In this project, we use a dataset from the 2012 French national census. The dataset gives us information on the “communes” and regions with a population exceeding 2000 individuals.
Our objective is to conduct an analysis of this data for the regions
of Bourgogne and Franche-Comté. The
following map shows the departments of
Bourgogne and Franche-Comté. We will carry out an analysis of the
housing status of the residents in relation to their age group, gender, level of
education, employment status and the type of housing they live in.
Burgundy, or Bourgogne in French, is a region located in the
eastern part of France. It is famous for its world-renowned wines and
its rich cultural heritage. The region is made up of four departments:
Yonne, Côte-d'Or, Saône-et-Loire,
and Nièvre. Beyond its wines, the region is known for
high-quality food products such as Dijon
mustard. In more recent times, the region has undergone significant
economic and demographic changes, with a decline in traditional
industries such as agriculture and manufacturing, and a rise in the
service sector. As of 2012, the population of the Bourgogne region was
estimated to be around 1.6 million people, with Dijon being
the largest city and the regional capital. The region is home to numerous
historical landmarks, such as the Abbaye de Fontenay and the Hospices de
Beaune, which attract many visitors every year.
The Franche-Comté region is located in the eastern part of
France, bordering Switzerland. It is made up of four departments:
Doubs, Jura, Haute-Saône, and
Territoire de Belfort. Historically, the region was known
for its watchmaking industry and its strategic location between France
and Switzerland. However, in the 20th century, the region experienced
significant economic decline as traditional industries such as
agriculture and manufacturing struggled to compete in a globalized
market. In recent years, the region has attempted to revitalize its
economy by focusing on high-tech industries such as microelectronics and
nanotechnology, as well as promoting tourism and cultural heritage
sites. As of 2012, the population of Franche-Comté was estimated to be
around 1.2 million people, with Besançon being the largest
city and the regional capital. The region is known for its beautiful
natural landscapes, including the Jura Mountains, and its rich cultural
heritage, which includes the historic city of Besançon and numerous
museums and art galleries.
The demographic study of Bourgogne and Franche-Comté will involve an analysis of its individual “communes,” with two separate analyses to be conducted. All data will be presented as percentages.
Person belonging to an age class:
Gender of the residents:
The level of Education:
Employment Status:
Housing type of residents:
Living in secondary housing (resid_sec)
Living in HLM (hlm)
Living in a home (maison)
Living in an apartment (appart)
Housing status of residents:
Index of variable names used in the R code
Using the dataset, we will analyse the relationships between the variables mentioned above to gain insight into the economic status of the people of Bourgogne and Franche-Comté, guided by the following questions:
What is the relationship between the housing type of the residents and their education and employment status?
Is a person's gender correlated with their type of occupation and education level in each region?
Is the type of property a person lives in influenced by their gender, occupation status or education?
As a first step, we create the 3 new variables that are missing for our analysis and use them along with the chosen variables as follows:
has_dip = pop_dipl_bepc + pop_dipl_capbep + pop_dipl_bac + pop_dipl_bac2 + pop_dipl_sup
pop_30_74 = pop_tot - pop_0_14 - pop_15_29 - pop_75p
pop_hommes = pop_tot - pop_femmes

| Column_Name | Description |
|---|---|
| INSEE_COM | INSEE code |
| commune | Commune |
| code_region | Code of the region |
| region | Name of the region |
| code_departement | Code of the department |
| departement | Name of the department |
| pop_tot | Total Population |
| pop_cl | Population Class |
| pop_0_14 | Population aged 0-14 years |
| pop_15_29 | Population aged 15-29 years |
| pop_18_24 | Population between 18-24 years |
| pop_75p | Population aged 75 years and over |
| pop_femmes | Population of Women |
| pop_act_15p | Active population aged 15 years and over |
| pop_chom | Unemployed population |
| pop_agric | Population working in agriculture |
| pop_indep | Population of Independent employment |
| pop_cadres | Population working as managers or executives |
| pop_interm | Population working in intermediate professions |
| pop_empl | Population working as employees (salariés) |
| pop_ouvr | Manual workers |
| pop_scol_18_24 | Population aged between 18-24 pursuing an education |
| pop_non_scol_15p | Population aged 15 years and over not in school |
| pop_dipl_aucun | Population having no diploma |
| pop_dipl_bepc | Population having a BEPC diploma (a French diploma obtained after completing lower secondary education) |
| pop_dipl_capbep | Population having a CAP or BEP diploma (a French vocational diploma obtained after completing secondary education) |
| pop_dipl_bac | Population having a Baccalaureate diploma (a French diploma obtained after completing upper secondary education) |
| pop_dipl_bac2 | Population having a 2-year post-Baccalaureate diploma (a French diploma obtained after completing 2 years of higher education after the Baccalaureate). |
| pop_dipl_sup | Population with a higher education diploma (more than 2 years of higher education) |
| log_rp | Population having a main residential property |
| log_proprio | Population owning their own home |
| log_loc | Population renting their home |
| log_hlm | Population living in HLM (French social housing program) |
| log_sec | Population living in secondary housing |
| log_maison | Population living in a house |
| log_appart | Population living in an apartment |
| age_0_14 | Percentage of population aged 0-14 years |
| age_15_29 | Percentage of population aged 15-29 years |
| age_75p | Percentage of population aged 75 years and over |
| femmes | Percentage of female population |
| chom | Percentage of unemployed population |
| agric | Percentage of population working in agriculture |
| indep | Percentage of population working as independent workers |
| cadres | Percentage of population working as managers or executives |
| interm | Percentage of population working in intermediate professions |
| empl | Percentage of population working as employees (salariés) |
| ouvr | Percentage of population working as manual workers |
| etud | Percentage of population studying |
| dipl_aucun | Percentage of population with no diploma |
| dipl_bepc | Percentage of population with a BEPC diploma (a French diploma obtained after completing lower secondary education) |
| dipl_capbep | Percentage of population with a CAP or BEP diploma (a French vocational diploma obtained after completing secondary education) |
| dipl_bac | Percentage of population with a Baccalaureate diploma (a French diploma obtained after completing upper secondary education) |
| dipl_bac2 | Percentage of population with a 2-year post-Baccalaureate diploma (a French diploma obtained after completing 2 years of higher education after the Baccalaureate) |
| dipl_sup | Percentage of population with a higher education diploma (more than 2 years of higher education) |
| resid_sec | Percentage of population living in secondary housing |
| proprio | Percentage of population owning a home |
| locataire | Percentage of population renting a home |
| hlm | Percentage of population living in HLM (French social housing program) |
| maison | Percentage of population living in a house |
| appart | Percentage of population living in an apartment |
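The three derived variables defined just before the variable index can be computed directly from the count columns. The sketch below uses a small invented data frame (the three communes and all their values are hypothetical), and it assumes that the percentage versions `age_30_74_pct` and `pop_hommes_pct` used later are obtained by dividing by `pop_tot`; the original code may differ.

```r
# Toy data: three hypothetical communes (all values invented for illustration).
communes <- data.frame(
  commune         = c("A", "B", "C"),
  pop_tot         = c(5000, 12000, 2500),
  pop_0_14        = c(900, 2100, 450),
  pop_15_29       = c(800, 2600, 350),
  pop_75p         = c(500, 1100, 300),
  pop_femmes      = c(2600, 6300, 1280),
  pop_dipl_bepc   = c(300, 700, 160),
  pop_dipl_capbep = c(900, 2000, 500),
  pop_dipl_bac    = c(600, 1500, 280),
  pop_dipl_bac2   = c(400, 1100, 150),
  pop_dipl_sup    = c(350, 1300, 90)
)

# Derived counts, following the three definitions given in the text.
communes$has_dip <- with(communes,
  pop_dipl_bepc + pop_dipl_capbep + pop_dipl_bac + pop_dipl_bac2 + pop_dipl_sup)
communes$pop_30_74  <- with(communes, pop_tot - pop_0_14 - pop_15_29 - pop_75p)
communes$pop_hommes <- with(communes, pop_tot - pop_femmes)

# Assumed percentage versions (share of the total population of the commune).
communes$age_30_74_pct  <- 100 * communes$pop_30_74  / communes$pop_tot
communes$pop_hommes_pct <- 100 * communes$pop_hommes / communes$pop_tot

communes[, c("commune", "has_dip", "pop_30_74", "pop_hommes",
             "age_30_74_pct", "pop_hommes_pct")]
```

Note that `has_dip` built this way is a count, not a percentage, so it lives on a very different scale from the percentage variables.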
We will now extract some descriptive statistics from our data for the regions of Bourgogne and Franche-Comté.
The following map gives an overview of the concentration of the population.
From the bar plot above, we can see that the largest numbers of residents in the Bourgogne region are recorded in Dijon, Chalon-sur-Saône and Nevers. In the Franche-Comté region, the largest numbers of residents are recorded in Besançon and Belfort.
age_sex_long <- age_sex %>%
gather(key = "age_group", value = "population", pop_0_14, pop_15_29, pop_30_74, pop_75p) %>%
gather(key = "gender", value = "count", Hommes, Femmes)
ggplot(age_sex_long, aes(x = region, y = count, fill = age_group)) +
geom_bar(stat = "identity", position = "stack") +
facet_wrap(~ gender, ncol = 2, scales = "free_y") +
labs(title = "Population by Age Group, Region, and Gender", x = "Region", y = "Population") +
scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73", "#F0E442")) +
theme_minimal()
From the bar chart above, we can see a larger number of men than women in both regions. We can also see that the age groups are distributed in almost the same proportions for each gender in both Bourgogne and Franche-Comté.
From both bar charts, we see that the level of unemployment is higher in Franche-Comté than in Bourgogne. When we look at unemployment by department for the two regions, Territoire de Belfort, Yonne, Doubs and Haute-Saône have the highest levels of unemployment.
G_age <- ggplot(data = bg_fc, mapping = aes(x = departement, y = dipl_aucun, col = region)) +
geom_boxplot() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
G_age
We can see higher percentages of individuals without a diploma in Haute-Saône and Yonne.
| region | perc_propio | perc_locataire |
|---|---|---|
| Bourgogne | 23.99 | 22.73 |
| Franche-Comte | 22.15 | 23.92 |
In the Franche-Comté region, a higher percentage of people rent the property they live in than own it. The opposite holds for the Bourgogne region, which has a slightly higher percentage of people owning their home.
| region | perc_sec | perc_maison | perc_appart |
|---|---|---|---|
| Bourgogne | 1.38 | 21.45 | 25.86 |
| Franche-Comte | 1.20 | 17.46 | 29.15 |
We can see a higher percentage of individuals living in an apartment than in a house in both regions. Only a small percentage of people live in secondary housing in either region.
We have gained some insight into the data of both Bourgogne and Franche-Comté. We have seen some differences between the regions, and the variables do not all tell the same story. We will now study the dataset of the 2 regions (Bourgogne and Franche-Comté) to see whether relationships exist between the variables.
In the correlation plot, darker colors indicate stronger correlation, and the orientation of the ellipse indicates the sign of the correlation. We can observe the following relationships between our variables:
The 15-29 age group is positively correlated with having a diploma, renting, living in HLM and in apartments, and negatively correlated with owning a home. The 30-74 age group shows the opposite behavior. The 75 and over age group is positively correlated with women and negatively correlated with men.
Opposite behaviors can be observed between the genders. The women category is positively correlated with renting and with living in HLM and apartments, and negatively correlated with owning a property and living in a house. Conversely, the men category shows the opposite behavior.
The unemployment category (chom) is positively correlated with manual workers, having a diploma, renting, and living in HLM and apartments, and negatively correlated with owning a home and living in a house.
The cadres category is positively correlated with intermediate jobs and negatively correlated with the manual workers and no diploma categories.
The education category (no diploma and diploma) is negatively correlated with owning a home and positively correlated with renting and with living in HLM and apartments.
The housing category shows that living in a house is positively correlated with owning a property and negatively correlated with living in HLM and apartments. Renting and living in HLM show the opposite behavior.
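The mirrored behavior of the women and men categories is a mechanical consequence of the two percentages summing to 100: any variable correlated with one is correlated with the other with the same strength and the opposite sign. The sketch below illustrates this on simulated data (all values are invented; only the column names are borrowed from the dataset).

```r
set.seed(1)
n <- 50

# Simulated percentages for 50 hypothetical communes.
femmes  <- rnorm(n, mean = 52, sd = 2)
proprio <- 60 - 1.5 * (femmes - 52) + rnorm(n, sd = 3)

toy <- data.frame(
  femmes         = femmes,
  pop_hommes_pct = 100 - femmes,                    # complement of femmes
  proprio        = proprio,
  locataire      = 100 - proprio - runif(n, 0, 3)   # roughly the complement of proprio
)

# Pearson correlation matrix, the quantity displayed by a correlation plot.
corr <- cor(toy)
round(corr, 2)
```

In the output, the `femmes` and `pop_hommes_pct` rows are exact sign flips of each other, so keeping both in an analysis adds no new information about the communes.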
We will now look at a separate correlation plot for each region.
For the Bourgogne region, we observe correlations similar to those of the two regions combined.
The correlation plot for the Franche-Comté region reveals the following differences:
The age category of 75 years and above shows a strong positive correlation with women and a negative correlation with men.
The gender category shows a similar trend, but with weaker correlations than in the previous correlation plots.
The housing type behavior in the gender category does not exhibit a significant correlation.
The type of employment category exhibits a similar trend, with stronger correlation as indicated by darker colors, particularly with respect to the type of housing.
We have seen that many variables in the data are correlated. We will use Principal Component Analysis (PCA), a dimension reduction method, to obtain a lower-dimensional representation while preserving most of the variability of the original data. We will run a PCA for Bourgogne and Franche-Comté using variables from the following categories: population age group, gender, education level, employment status, housing type and housing status.
We run a PCA on a set of 19 variables extracted from the census
data rp2012. These variables capture different aspects,
such as gender (male or female), education
(has diploma or no diploma), employment
(unemployed, agriculture,
independent, managers, interm and
ouvr workers), housing (owner,
rent, HLM, house,
apartment, resid_sec), and age groups
(15-29, 30-74 and 75 and over). Below
is the analysis of both regions together.
To determine the most suitable type of PCA in this case, we examine the variances and covariances of our variables.
| | age_15_29 | age_30_74_pct | age_75p | femmes | pop_hommes_pct | chom | agric | indep | cadres | interm | ouvr | dipl_aucun | has_dip | resid_sec | proprio | locataire | hlm | maison | appart |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age_15_29 | 10.10 | -7.15 | -3.47 | -0.70 | 0.70 | 3.66 | -0.52 | -1.58 | -0.12 | -1.70 | 2.87 | 2.77 | 13004.37 | -0.75 | -23.82 | 23.48 | 10.62 | -41.56 | 41.25 |
| age_30_74_pct | -7.15 | 10.46 | -2.66 | -1.87 | 1.87 | -7.02 | 0.17 | 0.59 | 3.23 | 6.00 | -6.98 | -7.63 | -8647.80 | -0.83 | 29.60 | -28.75 | -13.93 | 41.20 | -40.41 |
| age_75p | -3.47 | -2.66 | 12.29 | 4.23 | -4.23 | 4.34 | 0.44 | 1.62 | -2.38 | -4.89 | 0.96 | 5.29 | -783.95 | 2.11 | -13.28 | 12.41 | 4.72 | -9.28 | 8.32 |
| femmes | -0.70 | -1.87 | 4.23 | 3.50 | -3.50 | 2.88 | -0.09 | 0.33 | 0.44 | -0.49 | -1.72 | 2.26 | 2482.91 | 0.29 | -10.13 | 9.81 | 6.53 | -12.26 | 11.89 |
| pop_hommes_pct | 0.70 | 1.87 | -4.23 | -3.50 | 3.50 | -2.88 | 0.09 | -0.33 | -0.44 | 0.49 | 1.72 | -2.26 | -2482.91 | -0.29 | 10.13 | -9.81 | -6.53 | 12.26 | -11.89 |
| chom | 3.66 | -7.02 | 4.34 | 2.88 | -2.88 | 21.37 | -1.33 | -2.38 | -8.46 | -11.39 | 18.34 | 21.44 | 6695.98 | -1.10 | -43.58 | 43.18 | 35.59 | -55.07 | 54.25 |
| agric | -0.52 | 0.17 | 0.44 | -0.09 | 0.09 | -1.33 | 1.44 | 0.37 | -1.01 | -1.13 | 1.15 | -0.62 | -1291.93 | 1.13 | 2.47 | -2.69 | -2.41 | 6.13 | -6.17 |
| indep | -1.58 | 0.59 | 1.62 | 0.33 | -0.33 | -2.38 | 0.37 | 2.97 | -0.21 | -0.36 | -3.90 | -2.26 | -2181.17 | 2.19 | 4.33 | -4.66 | -6.70 | 10.27 | -10.41 |
| cadres | -0.12 | 3.23 | -2.38 | 0.44 | -0.44 | -8.46 | -1.01 | -0.21 | 22.66 | 14.57 | -29.40 | -16.63 | 8744.30 | -3.58 | 17.64 | -17.29 | -13.80 | 4.64 | -3.91 |
| interm | -1.70 | 6.00 | -4.89 | -0.49 | 0.49 | -11.39 | -1.13 | -0.36 | 14.57 | 23.53 | -29.94 | -18.77 | 3119.05 | -2.78 | 25.48 | -24.82 | -13.60 | 25.68 | -25.12 |
| ouvr | 2.87 | -6.98 | 0.96 | -1.72 | 1.72 | 18.34 | 1.15 | -3.90 | -29.40 | -29.94 | 75.30 | 34.68 | -10515.02 | 4.40 | -34.94 | 34.99 | 26.81 | -43.28 | 42.88 |
| dipl_aucun | 2.77 | -7.63 | 5.29 | 2.26 | -2.26 | 21.44 | -0.62 | -2.26 | -16.63 | -18.77 | 34.68 | 37.21 | -62.86 | -0.51 | -43.34 | 42.74 | 36.21 | -51.99 | 50.97 |
| has_dip | 13004.37 | -8647.80 | -783.95 | 2482.91 | -2482.91 | 6695.98 | -1291.93 | -2181.17 | 8744.30 | 3119.05 | -10515.02 | -62.86 | 59208832.20 | -2014.91 | -37456.81 | 37064.48 | 15990.84 | -73541.41 | 73149.91 |
| resid_sec | -0.75 | -0.83 | 2.11 | 0.29 | -0.29 | -1.10 | 1.13 | 2.19 | -3.58 | -2.78 | 4.40 | -0.51 | -2014.91 | 18.81 | -3.08 | 1.87 | -3.49 | -1.52 | 1.05 |
| proprio | -23.82 | 29.60 | -13.28 | -10.13 | 10.13 | -43.58 | 2.47 | 4.33 | 17.64 | 25.48 | -34.94 | -43.34 | -37456.81 | -3.08 | 175.31 | -171.74 | -108.09 | 234.69 | -230.77 |
| locataire | 23.48 | -28.75 | 12.41 | 9.81 | -9.81 | 43.18 | -2.69 | -4.66 | -17.29 | -24.82 | 34.99 | 42.74 | 37064.48 | 1.87 | -171.74 | 169.05 | 107.43 | -230.77 | 226.99 |
| hlm | 10.62 | -13.93 | 4.72 | 6.53 | -6.53 | 35.59 | -2.41 | -6.70 | -13.80 | -13.60 | 26.81 | 36.21 | 15990.84 | -3.49 | -108.09 | 107.43 | 104.92 | -141.72 | 140.02 |
| maison | -41.56 | 41.20 | -9.28 | -12.26 | 12.26 | -55.07 | 6.13 | 10.27 | 4.64 | 25.68 | -43.28 | -51.99 | -73541.41 | -1.52 | 234.69 | -230.77 | -141.72 | 401.52 | -397.92 |
| appart | 41.25 | -40.41 | 8.32 | 11.89 | -11.89 | 54.25 | -6.17 | -10.41 | -3.91 | -25.12 | 42.88 | 50.97 | 73149.91 | 1.05 | -230.77 | 226.99 | 140.02 | -397.92 | 395.38 |
The variances of the variables differ greatly in
magnitude (for example, the variance of
has_dip is several orders of magnitude larger than the others). To address this problem and ensure
an accurate analysis, we will use a standardized PCA to scale the
variances appropriately.
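The effect of standardization can be illustrated with base R's `prcomp()`. This is a sketch on simulated data, not the original analysis: the data frame `toy` and its values are invented, with two columns on a percentage scale and one on a raw count scale to mimic `has_dip`.

```r
set.seed(42)
n <- 100

# Simulated communes: two percentage variables and one raw count.
toy <- data.frame(
  chom    = rnorm(n, mean = 12, sd = 4),
  proprio = rnorm(n, mean = 60, sd = 13),
  has_dip = round(runif(n, min = 500, max = 25000))
)

# PCA on the covariance matrix (unscaled) and on the correlation matrix (scaled).
pca_raw <- prcomp(toy, center = TRUE, scale. = FALSE)
pca_std <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of the total variance carried by the first component in each case.
share_raw <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
share_std <- pca_std$sdev[1]^2 / sum(pca_std$sdev^2)

# Absolute loading of has_dip on the first unscaled component.
loading_has_dip <- abs(pca_raw$rotation["has_dip", 1])

c(share_raw = share_raw, share_std = share_std, loading_has_dip = loading_has_dip)
```

Without scaling, the first component is essentially `has_dip` alone and absorbs almost all of the variance; with scaling, each variable contributes one unit of variance. With FactoMineR, the same choice is made through the `scale.unit = TRUE` argument of `PCA()`.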
kable(PCA_sd$eig[1:4,], digits = 3, format = "markdown")
| | eigenvalue | percentage of variance | cumulative percentage of variance |
|---|---|---|---|
| comp 1 | 7.367 | 38.771 | 38.771 |
| comp 2 | 2.895 | 15.239 | 54.011 |
| comp 3 | 2.657 | 13.986 | 67.997 |
| comp 4 | 1.500 | 7.894 | 75.891 |
fviz_eig(PCA_sd)
The PCA reduces the dimension of the dataset. We will keep 3 axes,
which explain 68.00% of the variance of the data (4 axes would explain 75.89%).
We will carry out our analysis with 3 axes.
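The percentages in the eigenvalue table can be checked by hand. In a standardized PCA each variable has unit variance, so the total variance equals the number of variables (19 in the covariance table above), and each percentage is the eigenvalue divided by 19. The sketch below redoes this arithmetic from the rounded eigenvalues, so the last digits differ slightly from the table.

```r
# Eigenvalues as printed in the table (rounded to 3 decimals).
eig <- c(comp1 = 7.367, comp2 = 2.895, comp3 = 2.657, comp4 = 1.500)

# Total variance of a standardized PCA = number of variables.
p <- 19

pct     <- 100 * eig / p   # percentage of variance per component
cum_pct <- cumsum(pct)     # cumulative percentage of variance

round(rbind(pct, cum_pct), 2)
```

The cumulative value for the third component is about 68%, and the value of about 75.89% is only reached with the fourth component.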
kable(PCA_sd$var$coord[,1:3], digits = 3, format = "markdown")
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 0.555 | -0.415 | -0.501 |
| age_30_74_pct | -0.739 | 0.097 | 0.019 |
| age_75p | 0.303 | 0.240 | 0.772 |
| femmes | 0.447 | -0.125 | 0.825 |
| pop_hommes_pct | -0.447 | 0.125 | -0.825 |
| chom | 0.818 | 0.201 | -0.011 |
| agric | -0.187 | 0.379 | 0.119 |
| indep | -0.291 | 0.180 | 0.471 |
| cadres | -0.348 | -0.778 | 0.157 |
| interm | -0.507 | -0.682 | 0.080 |
| ouvr | 0.448 | 0.695 | -0.343 |
| dipl_aucun | 0.687 | 0.498 | -0.085 |
| has_dip | 0.383 | -0.576 | -0.067 |
| resid_sec | 0.016 | 0.254 | 0.172 |
| proprio | -0.950 | 0.096 | -0.025 |
| locataire | 0.951 | -0.101 | 0.010 |
| hlm | 0.815 | -0.021 | -0.045 |
| maison | -0.887 | 0.308 | 0.095 |
| appart | 0.881 | -0.313 | -0.105 |
Component Summaries
PCA - Dim.1
The age group 30-74, unemployed individuals (chom), home owners (proprio), tenants (locataire), people living in a house (maison) or an apartment (appart), and HLM residents (hlm) are the categories best represented by the first principal component. The category with no diploma (dipl_aucun) is also well represented.
PCA - Dim.2
The cadres are the best represented in this dimension, followed by interm and manual workers (ouvr).
PCA - Dim.3
The age group 75 and over, women and men are the best represented in this dimension.
kable(PCA_sd$var$contrib[,1:3], digits = 3, format = "markdown")
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 4.181 | 5.943 | 9.450 |
| age_30_74_pct | 7.409 | 0.325 | 0.014 |
| age_75p | 1.246 | 1.984 | 22.412 |
| femmes | 2.710 | 0.542 | 25.616 |
| pop_hommes_pct | 2.710 | 0.542 | 25.616 |
| chom | 9.077 | 1.395 | 0.005 |
| agric | 0.473 | 4.965 | 0.529 |
| indep | 1.152 | 1.117 | 8.343 |
| cadres | 1.648 | 20.898 | 0.924 |
| interm | 3.490 | 16.051 | 0.241 |
| ouvr | 2.730 | 16.677 | 4.434 |
| dipl_aucun | 6.406 | 8.549 | 0.272 |
| has_dip | 1.992 | 11.444 | 0.169 |
| resid_sec | 0.004 | 2.226 | 1.119 |
| proprio | 12.239 | 0.319 | 0.023 |
| locataire | 12.287 | 0.356 | 0.004 |
| hlm | 9.016 | 0.015 | 0.075 |
| maison | 10.684 | 3.269 | 0.343 |
| appart | 10.547 | 3.383 | 0.412 |
fviz_contrib(PCA_sd, choice = "var", axes = 1)
According to the contribution plot, tenants
(locataire), property owners (proprio), people living
in a house or an apartment, people living in subsidized housing
(HLM), and the unemployed contribute the most
to the first dimension.
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 0.308 | 0.172 | 0.251 |
| age_30_74_pct | 0.546 | 0.009 | 0.000 |
| age_75p | 0.092 | 0.057 | 0.596 |
| femmes | 0.200 | 0.016 | 0.681 |
| pop_hommes_pct | 0.200 | 0.016 | 0.681 |
| chom | 0.669 | 0.040 | 0.000 |
| agric | 0.035 | 0.144 | 0.014 |
| indep | 0.085 | 0.032 | 0.222 |
| cadres | 0.121 | 0.605 | 0.025 |
| interm | 0.257 | 0.465 | 0.006 |
| ouvr | 0.201 | 0.483 | 0.118 |
| dipl_aucun | 0.472 | 0.248 | 0.007 |
| has_dip | 0.147 | 0.331 | 0.004 |
| resid_sec | 0.000 | 0.064 | 0.030 |
| proprio | 0.902 | 0.009 | 0.001 |
| locataire | 0.905 | 0.010 | 0.000 |
| hlm | 0.664 | 0.000 | 0.002 |
| maison | 0.787 | 0.095 | 0.009 |
| appart | 0.777 | 0.098 | 0.011 |
The age groups of 30-74 and 15-29 are better represented in dimensions 1 and 2. Similarly, the categories of education attainment (diploma and no diploma), job types (cadres, intermediate, manual workers, and unemployed), and housing types (living in a house, owning a house, tenants, apartment, and HLM) are also well-represented in these dimensions.
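The three variable tables (coordinates, contributions and quality of representation) are linked by simple formulas. In a standardized PCA, the coordinate of a variable on an axis is its correlation with that axis, the cos2 is the squared coordinate, and the contribution (in percent) is the cos2 divided by the eigenvalue of the axis. The sketch below recomputes a few Dim.1 entries from the rounded values printed above, so small differences in the last digit are expected.

```r
# Eigenvalue of the first component, as printed in the eigenvalue table.
eig1 <- 7.367

# Dim.1 coordinates of a few variables, copied from the coordinate table.
coord_dim1 <- c(proprio = -0.950, locataire = 0.951, chom = 0.818, resid_sec = 0.016)

cos2_dim1    <- coord_dim1^2              # quality of representation on Dim.1
contrib_dim1 <- 100 * cos2_dim1 / eig1    # contribution to Dim.1, in percent

round(cbind(coord = coord_dim1, cos2 = cos2_dim1, contrib = contrib_dim1), 3)
```

A variable such as `resid_sec`, with a coordinate close to zero, has a cos2 and a contribution close to zero on this axis, whatever its importance on other axes.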
From our previous component summaries for dimensions 1 and 2, the age group 30-74, unemployed individuals (chom), home owners (proprio), tenants (locataire), people living in a house (maison) or an apartment (appart), and HLM residents (hlm) are the categories best represented by the first principal component. The category with no diploma (dipl_aucun) is also well represented.
We use the following variables to interpret the first axis:
The level of Education:
Employment Status:
Population age group:
Housing Status of residents:
In the opposite direction:
The second axis:
Interpretation of the first axis:
We can see that being a manual worker (ouvr) is closely correlated with not having a diploma.
We also see that living in HLM housing is correlated with being unemployed, and inversely correlated with living in a house and being an owner.
Being aged 30 to 74 is correlated with living in a house and being an owner.
Renting a property is correlated with living in an apartment, and inversely correlated with living in a house as an owner.
Interpretation of the second axis:
Being aged 15-29 is correlated with having at least one diploma, and inversely correlated with older age groups, with working in intermediate professions (interm) and with being an executive.
Working as an executive (cadres) is inversely correlated with being a manual worker (ouvr).
Communes 125 (Branges - BG) and 142 (Gergy - BG) are associated with home ownership and the 30-74 age group. Communes 138 (Le Creusot - BG) and 179 (Sens - BG) show the inverse relationship: renting an apartment and a younger population aged 15 to 29.
We can see that the age groups 15-29, 30-74 and 75 and over are all well represented. In education, those without a diploma are more strongly represented. Housing categories such as maison, proprio, locataire, appart and hlm are well represented, as are the unemployed. Additionally, both women and men are better represented in this dimension.
We can see that communes 191 (Évette-Salbert - FC), 27
(Varois-et-Chaignot - BG), 98 (Marzy - BG) and 101 (Saint-Éloi - BG) are associated with
home owners aged between 30 and 74 years old. Communes 85 (Salins-les-Bains - FC)
and 88 (Château-Chinon (Ville) - BG) are also identified on the plot. Communes 52 (Montbéliard - FC) and 63
(Seloncourt - FC) are associated with renting and unemployment.
We see a more accurate representation of both women and men across a range of categories in this dimension, including those aged 15-29 and >75, as well as cadres, intermediate workers, those with diplomas, and manual workers.
Communes 66 (Valdahon - FC), 118 (Saint-Sauveur - FC) and 161 (Varennes-le-Grand - BG) have a higher share of men. Communes 9 (Fontaine-lès-Dijon - BG) and 20 (Saint-Apollinaire - BG) have a higher share of people employed as executives.
According to the gap statistic method, it is recommended that we choose two clusters for the combined dataset of the Bourgogne and Franche-Comté regions. With the K-means method, the cluster sizes are 93 and 102.
kmeans_clusters1 <- kmeans(PCA_sd$ind$coord[,1:3], centers = 2)
#kable(kmeans_clusters1$cluster, format = "markdown")
cat("The sizes of clusters 1 and 2 are", "\n", kmeans_clusters1$size)
## The sizes of clusters 1 and 2 are
## 93 102
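K-means starts from random centers, so the cluster numbering (and sometimes the partition itself) can change from one run to the next. The sketch below shows one way to make such a run reproducible, using a fixed seed and several random starts. The matrix `coords` is a simulated stand-in for `PCA_sd$ind$coord[, 1:3]`; its values and group sizes are invented for illustration.

```r
set.seed(123)

# Simulated 3D coordinates for 195 hypothetical communes in two separated groups.
coords <- rbind(
  matrix(rnorm(93 * 3, mean = -2), ncol = 3),
  matrix(rnorm(102 * 3, mean = 2), ncol = 3)
)

# nstart = 25 runs 25 random initialisations and keeps the best solution.
km <- kmeans(coords, centers = 2, nstart = 25)

# Cluster labels are arbitrary, so sizes are sorted before being reported.
sorted_sizes <- sort(km$size)

# Share of the total inertia that remains within the clusters.
within_share <- km$tot.withinss / km$totss

sorted_sizes
within_share
```

For the choice of the number of clusters, the gap statistic is available through `clusGap()` in the `cluster` package, which may or may not be what the original analysis used.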
3D plot of the K-means clusters
We use the hclust function with the ward.D2 method to generate the dendrogram, selecting 3 as the number of clusters.
##
## The communes of the First cluster are:
## Arc-sur-Tille Beaune Châtillon-sur-Seine Chenôve Chevigny-Saint-Sauveur Dijon Fontaine-lès-Dijon Gevrey-Chambertin Is-sur-Tille Longvic Marsannay-la-Côte Mirebeau-sur-Bèze Montbard Nuits-Saint-Georges Plombières-lès-Dijon Quetigny Saint-Apollinaire Saulieu Semur-en-Auxois Sennecey-lès-Dijon Seurre Talant Varois-et-Chaignot Venarey-les-Laumes Avanne-Aveney Baume-les-Dames Bavans Besançon Doubs École-Valentin Étupes Exincourt Franois L'Isle-sur-le-Doubs Mathay Miserey-Salines Montbéliard Montferrand-le-Château Morteau Ornans Pirey Pontarlier Pont-de-Roide Roche-lez-Beaupré Saint-Vit Saône Seloncourt Thise Vieux-Charmont Voujeaucourt Arbois Champagnole Damparis Dole Foucherans Lons-le-Saunier Montmorot Morbier Poligny Saint-Amour Saint-Claude Salins-les-Bains Tavaux La Charité-sur-Loire Château-Chinon (Ville) Clamecy Cosne-Cours-sur-Loire Coulanges-lès-Nevers Decize Fourchambault Guérigny Marzy Nevers Pougues-les-Eaux Saint-Éloi Saint-Pierre-le-Moûtier Varennes-Vauzelles Arc-lès-Gray Échenoz-la-Méline Gray Lure Luxeuil-les-Bains Rioz Vaivre-et-Montoille Vesoul Autun Bourbon-Lancy Le Breuil Buxy Chagny Chalon-sur-Saône Charnay-lès-Mâcon Charolles Châtenoy-le-Royal Chauffailles Cluny Le Creusot Crissey Digoin Épinac Givry Gueugnon Louhans Mâcon Montceau-les-Mines Montcenis Montchanin Paray-le-Monial Saint-Marcel Saint-Rémy Saint-Vallier Sanvignes-les-Mines Sennecey-le-Grand Tournus Appoigny Auxerre Avallon Chevannes Joigny Monéteau Paron Pont-sur-Yonne Saint-Clément Saint-Georges-sur-Baulche Saint-Julien-du-Sault Sens Tonnerre Toucy Villeneuve-sur-Yonne Bavilliers Belfort Châtenois-les-Forges Danjoutin Essert Évette-Salbert Giromagny Offemont Valdoie
##
## The communes of the Second cluster are:
## Auxonne Brazey-en-Plaine Genlis Selongey Audincourt Bethoncourt Charquemont Fesches-le-Châtel Les Fins Grand-Charmont Hérimoncourt Villers-le-Lac Levier Maîche Mandeure Le Russey Sochaux Valdahon Valentigney Moirans-en-Montagne Morez Les Rousses Saint-Lupicin Garchizy Imphy La Machine Saint-Léger-des-Vignes Champagney Fougerolles Héricourt Noidans-lès-Vesoul Port-sur-Saône Ronchamp Saint-Loup-sur-Semouse Saint-Sauveur Blanzy Branges Champforgeuil La Chapelle-de-Guinchay Ciry-le-Noble Crêches-sur-Saône Gergy Ouroux-sur-Saône Saint-Germain-du-Plain Sornay Torcy Varennes-le-Grand Brienon-sur-Armançon Chablis Champigny Cheny Migennes Saint-Florentin Villeneuve-la-Guyard Beaucourt Delle Grandvillars
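A listing of communes per cluster like the one above can be obtained from a Ward dendrogram with `hclust()` and `cutree()`. The sketch below is a minimal illustration on simulated coordinates; the commune names, the data and the choice `k = 2` are assumptions made for the example.

```r
set.seed(7)

# Simulated 3D coordinates for 35 hypothetical communes in two groups.
coords <- rbind(
  matrix(rnorm(20 * 3, mean = -2), ncol = 3),
  matrix(rnorm(15 * 3, mean = 2), ncol = 3)
)
rownames(coords) <- paste0("commune_", seq_len(nrow(coords)))

# Ward hierarchical clustering on Euclidean distances.
hc <- hclust(dist(coords), method = "ward.D2")

# Cut the dendrogram into 2 groups; the result is a named vector of labels.
groups <- cutree(hc, k = 2)

table(groups)                  # cluster sizes
split(names(groups), groups)   # commune names per cluster
```

Calling `plot(hc)` draws the dendrogram itself, which helps to judge visually how many groups are well separated.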
Cluster Analysis:
We conducted a cluster analysis using both 2 and 3 clusters. The gap statistic, which compares the within-cluster dispersion with the dispersion expected under a reference distribution without cluster structure, recommended 2 clusters. When we used 3 clusters, we found significant overlap between the clusters, which led us to continue our analysis with 2 clusters.
Once we have applied two clustering techniques, k-means and dendrogram, we can compare the results of both methods to gain a better understanding of the data structure and identify any potential patterns or relationships between the variables.
Although the k-means clustering method resulted in many intersections, making it difficult to interpret, the dendrogram method provided a better separation of the communes, leading to a clearer splitting of the communes for further interpretation. Therefore, we may find that the dendrogram method is more useful for analyzing this particular dataset.
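One simple way to compare the two methods is to cross-tabulate their cluster labels on the same observations. The sketch below does this on simulated coordinates (the data are invented and much better separated than real census data, so the agreement is higher than one should expect in practice).

```r
set.seed(99)

# Simulated 3D coordinates for 60 hypothetical communes in two groups.
coords <- rbind(
  matrix(rnorm(30 * 3, mean = -2), ncol = 3),
  matrix(rnorm(30 * 3, mean = 2), ncol = 3)
)

# Partition from k-means and partition from a Ward dendrogram.
km_labels <- kmeans(coords, centers = 2, nstart = 25)$cluster
hc_labels <- cutree(hclust(dist(coords), method = "ward.D2"), k = 2)

# Contingency table of the two partitions.
agreement <- table(kmeans = km_labels, hclust = hc_labels)

# Labels are arbitrary, so take the better of the two possible label matchings.
same  <- sum(diag(agreement))
match_rate <- max(same, sum(agreement) - same) / sum(agreement)

agreement
match_rate
```

A table concentrated on one diagonal means both methods recover the same grouping; mass spread over all four cells indicates that the two partitions disagree.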
After obtaining the two clusters using the dendrogram method, we can add them to the map of the Bourgogne and Franche-Comté regions to visually compare the two clusters in terms of demographic variables. This could help us identify potential relationships between the clusters and demographic factors.
We will now perform a PCA for each region to see how the findings from the combined analysis differ when the regions are analysed separately.
kable(PCA_sd_bg$eig[1:3,], digits = 3, format = "markdown")
| | eigenvalue | percentage of variance | cumulative percentage of variance |
|---|---|---|---|
| comp 1 | 7.820 | 41.159 | 41.159 |
| comp 2 | 3.592 | 18.905 | 60.063 |
| comp 3 | 2.267 | 11.931 | 71.994 |
fviz_eig(PCA_sd_bg)
We will also keep 3 axes, which explain 72% of the
variance for this region for our analysis.
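Running the same standardized PCA separately for each region amounts to splitting the data by region and applying the PCA to each subset. The sketch below shows this pattern with base R's `prcomp()` on simulated data; the data frame `toy`, its values and the three selected variables are invented for the example, and the original analysis uses more variables and FactoMineR output objects.

```r
set.seed(2012)

# Simulated percentages for 70 hypothetical communes in two regions.
toy <- data.frame(
  region  = rep(c("Bourgogne", "Franche-Comte"), times = c(40, 30)),
  chom    = rnorm(70, mean = 12, sd = 4),
  proprio = rnorm(70, mean = 60, sd = 13),
  appart  = rnorm(70, mean = 40, sd = 20)
)

# Keep only the numeric variables, split by region, run one PCA per region.
num_cols <- setdiff(names(toy), "region")
pca_by_region <- lapply(split(toy[num_cols], toy$region),
                        prcomp, center = TRUE, scale. = TRUE)

# Cumulative share of variance per component, one column per region.
cum_var <- sapply(pca_by_region, function(p) cumsum(p$sdev^2) / sum(p$sdev^2))
round(100 * cum_var, 1)
```

Because each region is standardized on its own means and variances, coordinates from the two PCAs are not directly comparable; only the structure of the axes is.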
We can see from the contribution plot that tenants (locataire), owners (proprio), people living in a house, an apartment or HLM, and the unemployed contribute the most to the first dimension.
kable(PCA_sd_bg$var$coord[,1:3], digits = 3, format = "markdown")
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 0.570 | -0.513 | -0.374 |
| age_30_74_pct | -0.799 | 0.036 | -0.032 |
| age_75p | 0.432 | 0.332 | 0.653 |
| femmes | 0.648 | -0.031 | 0.647 |
| pop_hommes_pct | -0.648 | 0.031 | -0.647 |
| chom | 0.824 | 0.233 | -0.125 |
| agric | -0.129 | 0.420 | 0.018 |
| indep | -0.216 | 0.370 | 0.504 |
| cadres | -0.302 | -0.751 | 0.386 |
| interm | -0.460 | -0.720 | 0.119 |
| ouvr | 0.386 | 0.749 | -0.384 |
| dipl_aucun | 0.670 | 0.493 | -0.285 |
| has_dip | 0.364 | -0.543 | -0.042 |
| resid_sec | 0.119 | 0.592 | 0.370 |
| proprio | -0.955 | 0.066 | 0.004 |
| locataire | 0.956 | -0.073 | -0.023 |
| hlm | 0.830 | -0.101 | -0.248 |
| maison | -0.877 | 0.381 | -0.001 |
| appart | 0.870 | -0.389 | -0.007 |
Component Summaries
First Principal Component - Dim.1
The best represented variables are age_30_74_pct, chom, proprio, locataire, hlm, maison and appart.
Second Principal Component - Dim.2
The best represented variables are cadres, interm and manual workers (ouvr).
Third Principal Component - Dim.3
Women, men and people aged 75 and over are well represented.
kable(PCA_sd_bg$var$contrib[,1:3], digits = 3, format = "markdown")
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 4.152 | 7.322 | 6.184 |
| age_30_74_pct | 8.158 | 0.037 | 0.046 |
| age_75p | 2.384 | 3.066 | 18.832 |
| femmes | 5.375 | 0.026 | 18.442 |
| pop_hommes_pct | 5.375 | 0.026 | 18.442 |
| chom | 8.688 | 1.509 | 0.689 |
| agric | 0.213 | 4.906 | 0.015 |
| indep | 0.595 | 3.817 | 11.197 |
| cadres | 1.168 | 15.690 | 6.586 |
| interm | 2.705 | 14.435 | 0.624 |
| ouvr | 1.904 | 15.637 | 6.512 |
| dipl_aucun | 5.734 | 6.762 | 3.582 |
| has_dip | 1.690 | 8.196 | 0.078 |
| resid_sec | 0.180 | 9.745 | 6.033 |
| proprio | 11.665 | 0.121 | 0.001 |
| locataire | 11.690 | 0.148 | 0.023 |
| hlm | 8.802 | 0.286 | 2.711 |
| maison | 9.843 | 4.050 | 0.000 |
| appart | 9.679 | 4.221 | 0.002 |
# fviz_pca_contrib() is deprecated; fviz_contrib() handles PCA, CA and MCA outputs
fviz_contrib(PCA_sd_bg, choice = "var", axes = 1)
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 0.325 | 0.263 | 0.140 |
| age_30_74_pct | 0.638 | 0.001 | 0.001 |
| age_75p | 0.186 | 0.110 | 0.427 |
| femmes | 0.420 | 0.001 | 0.418 |
| pop_hommes_pct | 0.420 | 0.001 | 0.418 |
| chom | 0.679 | 0.054 | 0.016 |
| agric | 0.017 | 0.176 | 0.000 |
| indep | 0.047 | 0.137 | 0.254 |
| cadres | 0.091 | 0.564 | 0.149 |
| interm | 0.212 | 0.519 | 0.014 |
| ouvr | 0.149 | 0.562 | 0.148 |
| dipl_aucun | 0.448 | 0.243 | 0.081 |
| has_dip | 0.132 | 0.294 | 0.002 |
| resid_sec | 0.014 | 0.350 | 0.137 |
| proprio | 0.912 | 0.004 | 0.000 |
| locataire | 0.914 | 0.005 | 0.001 |
| hlm | 0.688 | 0.010 | 0.061 |
| maison | 0.770 | 0.145 | 0.000 |
| appart | 0.757 | 0.152 | 0.000 |
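The three tables above are linked: the last one is consistent with squared coordinates (for instance 0.570² ≈ 0.325 for age_15_29 on Dim.1), and each contribution is the squared coordinate divided by the eigenvalue of its axis. The sketch below reproduces these relationships with base R's `prcomp()` on the built-in `USArrests` data. Both choices are assumptions made for illustration only: the report works on the census data with its own PCA object.

```r
# Illustration on a built-in dataset (USArrests), not the census data.
pca <- prcomp(USArrests, scale. = TRUE)

eig <- pca$sdev^2                                # eigenvalue of each component
# Variable coordinates = correlations between variables and components
coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
cos2 <- coord^2                                  # quality of representation
contrib <- 100 * sweep(cos2, 2, eig, `/`)        # contribution in percent

round(coord[, 1:3], 3)
round(cos2[, 1:3], 3)
round(contrib[, 1:3], 3)
colSums(contrib)                                 # each column sums to 100
```

Reading the tables this way explains why femmes and pop_hommes_pct always share the same cos2 and contribution: their coordinates differ only by sign.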
We use the following variables to interpret the first axis:
Employment Status:
Population age group:
Housing Status of residents:
The level of Education:
On the opposite direction:
The second axis:
Interpretation of the first axis:
Home ownership and living in a house are correlated with the 30-74 age group. Conversely, they are inversely correlated with renting an apartment and with the younger 15-29 age group.
Living in HLM housing is correlated with unemployment and with having no diploma.
Having no diploma is correlated with being a manual worker (ouvr).
We get the same observation when comparing both regions at once.
Interpretation of the second axis:
The younger 15-29 age group is correlated with having a diploma and with renting an apartment.
Executives (cadres) are correlated with intermediate professions (interm) and inversely correlated with having no diploma.
On the first axis, some communes are associated with home ownership, while others are associated with HLM housing and unemployment. We also see many communes with manual workers (ouvr) and no diploma. Only a few communes, such as 7 (Chevigny-Saint-Sauveur), 9 (Fontaine-lès-Dijon) and 20 (Saint-Apollinaire), have a high share of executives (cadres) and intermediate professions (interm).
Interpretation of the first axis:
Executives (cadres) and manual workers (ouvr) are represented more accurately here, and they are inversely correlated.
Interpretation of the second axis:
# k-means clustering on the first three principal components
set.seed(123)
kmeans_clusters_bg <- kmeans(PCA_bourg$ind$coord[, 1:3], centers = 3)
# Plot clusters with labels
p_bg <- fviz_cluster(kmeans_clusters_bg, data = PCA_bourg$ind$coord[, 1:3],
                     geom = "point", ellipse.type = "norm") +
  geom_text_repel(aes(label = rownames(PCA_bourg$ind$coord[, 1:3])),
                  fontface = "bold", size = 3)
p_bg
## Warning: ggrepel: 9 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
# Filter bourg_dat to select only the communes in Bourgogne
bourg_dat_bg <- bourg_dat[bourg_dat$region == "Bourgogne", ]
# Print the selected communes
#bourg_dat_bg
Multifac_dat_bg <- PCA_bourg$ind$coord[,1:3]
# Perform hierarchical clustering
hc_bg <- hclust(dist(scale(Multifac_dat_bg)), method = "ward.D2")
# Cut the dendrogram into 3 clusters
cut_ward_bg <- cutree(hc_bg, k = 3)
# Set the commune names as labels (hc_bg$labels follows the original row order)
hc_bg$labels <- bourg_dat_bg$commune
# Plot the dendrogram with cluster borders and commune names
plot(hc_bg, hang = -1, main = "Hierarchical Clustering of Communes (Bourgogne)")
rect.hclust(hc_bg, k = 3, border = 2:4)
abline(h = 3, col = "red")
##
## The communes of the first cluster
## Arc-sur-Tille Fontaine-lès-Dijon Marsannay-la-Côte Mirebeau-sur-Bèze Saint-Apollinaire Sennecey-lès-Dijon Talant Varois-et-Chaignot Coulanges-lès-Nevers Marzy Pougues-les-Eaux Saint-Éloi Varennes-Vauzelles Le Breuil Buxy Charnay-lès-Mâcon Châtenoy-le-Royal Crêches-sur-Saône Crissey Givry Montcenis Saint-Rémy Saint-Vallier Appoigny Chevannes Monéteau Saint-Clément Saint-Georges-sur-Baulche
##
## The communes of the second cluster
## Auxonne Beaune Chenôve Chevigny-Saint-Sauveur Dijon Genlis Gevrey-Chambertin Is-sur-Tille Longvic Montbard Nuits-Saint-Georges Plombières-lès-Dijon Quetigny Semur-en-Auxois Venarey-les-Laumes Fourchambault Imphy Nevers Autun Chagny Chalon-sur-Saône Champforgeuil Cluny Le Creusot Mâcon Montceau-les-Mines Montchanin Saint-Marcel Torcy Varennes-le-Grand Auxerre Avallon Brienon-sur-Armançon Cheny Joigny Migennes Paron Saint-Florentin Sens Tonnerre
##
## The communes of the third cluster
## Brazey-en-Plaine Châtillon-sur-Seine Saulieu Selongey Seurre La Charité-sur-Loire Château-Chinon (Ville) Clamecy Cosne-Cours-sur-Loire Decize Garchizy Guérigny La Machine Saint-Léger-des-Vignes Saint-Pierre-le-Moûtier Blanzy Bourbon-Lancy Branges La Chapelle-de-Guinchay Charolles Chauffailles Ciry-le-Noble Digoin Épinac Gergy Gueugnon Louhans Ouroux-sur-Saône Paray-le-Monial Saint-Germain-du-Plain Sanvignes-les-Mines Sennecey-le-Grand Sornay Tournus Chablis Champigny Pont-sur-Yonne Saint-Julien-du-Sault Toucy Villeneuve-la-Guyard Villeneuve-sur-Yonne
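The code that printed the three lists above is not shown. A minimal sketch of one way to obtain such lists from `cutree()` follows. It uses the built-in `USArrests` data and row names as stand-ins for the communes, which is an assumption for illustration and not the report's own code.

```r
# Illustration on a built-in dataset (USArrests), not the census data.
x <- scale(USArrests)
hc <- hclust(dist(x), method = "ward.D2")
cl <- cutree(hc, k = 3)          # cluster id per observation, in row order

# Group the row names by cluster and print one block per cluster
members <- split(rownames(x), cl)
for (k in names(members)) {
  cat("\nCluster", k, "\n", paste(members[[k]], collapse = ", "), "\n")
}
```

Because `cutree()` returns cluster ids in the original row order, the names can be taken directly from the data without reordering.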
In 2012, the communes of the Franche-Comté region of France were organized in a hierarchical administrative system. At the top of this system were the départements, or administrative divisions, of Doubs, Jura, Haute-Saône, and Territoire de Belfort. These départements were divided into arrondissements, which were further divided into cantons. At the lowest level were the communes, the smallest administrative divisions in the region.
The communes of Franche-Comté varied in size and population, ranging from small rural villages to larger urban centers. In 2012, there were a total of 1,178 communes in the region, but in this data set we study 86 communes.
kable(PCA_sd$eig[1:3,], digits = 3, format = "markdown")
| | eigenvalue | percentage of variance | cumulative percentage of variance |
|---|---|---|---|
| comp 1 | 7.367 | 38.771 | 38.771 |
| comp 2 | 2.895 | 15.239 | 54.011 |
| comp 3 | 2.657 | 13.986 | 67.997 |
fviz_eig(PCA_sd)
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 0.281 | 0.078 | 0.375 |
| age_30_74_pct | 0.433 | 0.001 | 0.042 |
| age_75p | 0.041 | 0.347 | 0.290 |
| femmes | 0.065 | 0.778 | 0.074 |
| pop_hommes_pct | 0.065 | 0.778 | 0.074 |
| chom | 0.667 | 0.000 | 0.034 |
| agric | 0.069 | 0.038 | 0.022 |
| indep | 0.126 | 0.082 | 0.000 |
| cadres | 0.196 | 0.228 | 0.255 |
| interm | 0.349 | 0.167 | 0.186 |
| ouvr | 0.270 | 0.306 | 0.231 |
| dipl_aucun | 0.501 | 0.019 | 0.226 |
| has_dip | 0.174 | 0.058 | 0.373 |
| resid_sec | 0.000 | 0.016 | 0.018 |
| proprio | 0.881 | 0.008 | 0.028 |
| locataire | 0.885 | 0.008 | 0.028 |
| hlm | 0.652 | 0.004 | 0.002 |
| maison | 0.829 | 0.004 | 0.056 |
| appart | 0.824 | 0.004 | 0.055 |
The PCA reduces the dimension of the data set. We keep 3 axes, which together explain about 68% of the variance of the data, and carry out our analysis on these 3 axes.
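The percentages in the eigenvalue table can be checked by hand. The sketch below assumes the PCA was run on standardized variables, so that the total variance equals the number of variables (19 in the tables above); the eigenvalues are copied from the table.

```r
# Eigenvalues copied from the table above; 19 standardized variables assumed
eig <- c(comp1 = 7.367, comp2 = 2.895, comp3 = 2.657)
n_var <- 19

pct <- 100 * eig / n_var     # percentage of variance per component
cum_pct <- cumsum(pct)       # cumulative percentage of variance

round(rbind(pct, cum_pct), 2)
all(eig > 1)                 # Kaiser rule: keep components with eigenvalue > 1
```

The small differences from the table come from the eigenvalues being rounded to three decimals.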
kable(PCA_fc.sd$var$coord[,1:3], digits = 3, format = "markdown")
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 0.530 | -0.280 | 0.613 |
| age_30_74_pct | -0.658 | -0.030 | -0.206 |
| age_75p | 0.203 | 0.589 | -0.539 |
| femmes | 0.254 | 0.882 | -0.272 |
| pop_hommes_pct | -0.254 | -0.882 | 0.272 |
| chom | 0.817 | 0.012 | -0.185 |
| agric | -0.262 | -0.195 | -0.148 |
| indep | -0.355 | 0.286 | -0.005 |
| cadres | -0.442 | 0.477 | 0.505 |
| interm | -0.591 | 0.409 | 0.432 |
| ouvr | 0.520 | -0.553 | -0.480 |
| dipl_aucun | 0.708 | -0.138 | -0.475 |
| has_dip | 0.418 | 0.242 | 0.611 |
| resid_sec | -0.013 | -0.127 | 0.134 |
| proprio | -0.939 | -0.092 | -0.169 |
| locataire | 0.941 | 0.092 | 0.168 |
| hlm | 0.808 | 0.062 | -0.043 |
| maison | -0.910 | -0.063 | -0.236 |
| appart | 0.908 | 0.062 | 0.234 |
Component Summaries
First Principal Component - Dim.1
The best-represented variables are those describing housing (house, apartment, HLM), together with unemployment and renting. Regarding age, the 15-29 and 30-74 groups are well represented. In terms of occupation, executives (cadres), intermediate professions (interm) and manual workers (ouvr) are also represented. Finally, the no-diploma variable (dipl_aucun) is well represented.
Second Principal Component - Dim.2
On the second axis, both gender variables (femmes and pop_hommes_pct) are well represented. The 75+ age group is also well represented, followed by manual workers (ouvr), executives (cadres) and intermediate professions (interm).
Third Principal Component - Dim.3
People aged 15-29 and those aged 75+ are well represented. Executives (cadres), intermediate professions (interm), manual workers (ouvr), and the variables for having no diploma and having at least one diploma are also well represented.
kable(PCA_fc.sd$var$contrib[,1:3], digits = 3, format = "markdown")
| | Dim.1 | Dim.2 | Dim.3 |
|---|---|---|---|
| age_15_29 | 3.841 | 2.674 | 15.846 |
| age_30_74_pct | 5.924 | 0.030 | 1.791 |
| age_75p | 0.563 | 11.876 | 12.253 |
| femmes | 0.886 | 26.590 | 3.118 |
| pop_hommes_pct | 0.886 | 26.590 | 3.118 |
| chom | 9.124 | 0.005 | 1.441 |
| agric | 0.942 | 1.302 | 0.929 |
| indep | 1.726 | 2.788 | 0.001 |
| cadres | 2.678 | 7.782 | 10.747 |
| interm | 4.773 | 5.710 | 7.873 |
| ouvr | 3.701 | 10.472 | 9.735 |
| dipl_aucun | 6.856 | 0.655 | 9.528 |
| has_dip | 2.386 | 1.999 | 15.737 |
| resid_sec | 0.002 | 0.555 | 0.758 |
| proprio | 12.062 | 0.289 | 1.202 |
| locataire | 12.106 | 0.288 | 1.198 |
| hlm | 8.926 | 0.132 | 0.078 |
| maison | 11.340 | 0.135 | 2.343 |
| appart | 11.280 | 0.131 | 2.305 |
# fviz_pca_contrib() is deprecated; fviz_contrib() handles PCA, CA and MCA outputs
fviz_contrib(PCA_fc.sd, choice = "var", axes = 1)
According to the contribution plot, the variables that contribute most to the first axis are tenants (locataire), property owners (proprio), living in a house or an apartment, living in subsidized housing (HLM), unemployment, having no diploma and the 30-74 age group.
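A common reading rule for such a contribution plot is to compare each variable with the contribution expected if all variables contributed equally, that is 100 / 19 ≈ 5.26% here. The sketch below applies that rule to the Dim.1 contributions copied from the table above; treating this threshold as the cut-off is an assumption about how the plot was read.

```r
# Dim.1 contributions (%) copied from the table above
contrib_dim1 <- c(
  age_15_29 = 3.841, age_30_74_pct = 5.924, age_75p = 0.563,
  femmes = 0.886, pop_hommes_pct = 0.886, chom = 9.124,
  agric = 0.942, indep = 1.726, cadres = 2.678, interm = 4.773,
  ouvr = 3.701, dipl_aucun = 6.856, has_dip = 2.386,
  resid_sec = 0.002, proprio = 12.062, locataire = 12.106,
  hlm = 8.926, maison = 11.340, appart = 11.280
)

# Expected contribution if all variables contributed equally
threshold <- 100 / length(contrib_dim1)

# Variables above the threshold, largest first
above <- sort(contrib_dim1[contrib_dim1 > threshold], decreasing = TRUE)
round(threshold, 2)
names(above)
```

The eight variables selected this way are the same ones listed in the paragraph above.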
Interpretation of the first axis
The interpretation is almost the same as for the Bourgogne region, but with a stronger correlation between renting an apartment, having no diploma and being unemployed.
Executive jobs are less associated with men and more associated with women, and this holds in more communes.
Renting an apartment and living in HLM housing are more closely correlated.
Interpretation of the second axis
People renting an apartment are the most represented in communes 24 (Montbéliard), 47 (Lons-le-Saunier), 51 (Morez) and 74 (Vesoul). People owning a home are well represented in 2 (Avanne-Aveney), 22 (Mathay), 25 (Montferrand-le-Château), 60 (Champagney) and 78 (Châtenois-les-Forges).
Women aged 75+ are more represented in communes 2 (Avanne-Aveney), 3 (Baume-les-Dames) and 25 (Montferrand-le-Château), as well as in 57 (Salins-les-Bains) and 63 (Gray).
# k-means clustering on the first three principal components
set.seed(123)
kmeans_clusters_FC <- kmeans(PCA_fc.sd$ind$coord[, 1:3], centers = 2)
# Plot clusters with labels
p_FC <- fviz_cluster(kmeans_clusters_FC, data = PCA_fc.sd$ind$coord[, 1:3],
                     geom = "point", ellipse.type = "norm") +
  geom_text_repel(aes(label = rownames(PCA_fc.sd$ind$coord[, 1:3])),
                  fontface = "bold", size = 3)
p_FC
## Warning: ggrepel: 11 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Interpreting the k-means results with 2 or 3 clusters is difficult because the ellipses all intersect, which makes the clusters hard to distinguish.
Since k-means does not split the communes well, we turn to the dendrogram, which gives a better representation.
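When the choice between 2 and 3 clusters is unclear, one common check is to compare the total within-cluster sum of squares across several values of k and look for an "elbow". The sketch below does this on the built-in `USArrests` data as a stand-in for the PCA coordinates; the dataset and the range of k are assumptions for illustration, not part of the original analysis.

```r
# Illustration on a built-in dataset (USArrests), not the census data.
set.seed(123)
x <- scale(USArrests)

# Total within-cluster sum of squares for k = 1..6
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

Using several random starts (`nstart = 25`) makes the result less sensitive to the initial centers than a single k-means run.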
# Filter FC_dat to select only the communes in Franche-Comté
FC_dat_fc <- FC_dat[FC_dat$region == "Franche-Comte", ]
# Extract the first three principal components from PCA_fc.sd
Multifac_dat_fc <- PCA_fc.sd$ind$coord[, 1:3]
# Perform hierarchical clustering (Ward's method) and cut into 3 clusters
hc_fc <- hclust(dist(scale(Multifac_dat_fc)), method = "ward.D2")
cut_ward_fc <- cutree(hc_fc, k = 3)
# Set the commune names as labels (hc_fc$labels follows the original row order)
hc_fc$labels <- FC_dat_fc$commune
# Plot the dendrogram with cluster borders and commune names
plot(hc_fc, hang = -1, main = "Hierarchical Clustering of Communes (Franche-Comté)")
rect.hclust(hc_fc, k = 3, border = 2:4)
abline(h = 3, col = "red")
##
## The communes of the first cluster
## Audincourt Bethoncourt Charquemont Fesches-le-Châtel Les Fins Grand-Charmont Hérimoncourt L'Isle-sur-le-Doubs Villers-le-Lac Levier Maîche Mandeure Morteau Pontarlier Pont-de-Roide Le Russey Sochaux Valdahon Valentigney Moirans-en-Montagne Morez Les Rousses Saint-Claude Saint-Lupicin Fougerolles Héricourt Noidans-lès-Vesoul Port-sur-Saône Ronchamp Saint-Loup-sur-Semouse Saint-Sauveur Beaucourt Delle Giromagny Grandvillars Offemont
##
## The communes of the second cluster
## Avanne-Aveney Baume-les-Dames Bavans Doubs École-Valentin Étupes Exincourt Franois Mathay Miserey-Salines Montferrand-le-Château Ornans Pirey Roche-lez-Beaupré Saint-Vit Saône Seloncourt Thise Vieux-Charmont Voujeaucourt Arbois Damparis Foucherans Montmorot Morbier Saint-Amour Tavaux Arc-lès-Gray Champagney Échenoz-la-Méline Rioz Vaivre-et-Montoille Châtenois-les-Forges Essert Évette-Salbert
##
## The communes of the third cluster
## Besançon Montbéliard Champagnole Dole Lons-le-Saunier Poligny Salins-les-Bains Gray Lure Luxeuil-les-Bains Vesoul Bavilliers Belfort Danjoutin Valdoie
# Calculate means of variables for each cluster
means_by_cluster <- aggregate(FC_dat_fc[, c("femmes", "pop_hommes_pct", "chom", "agric", "indep", "cadres")],
                              by = list(cluster = cut_ward_fc),
                              FUN = mean)
means_by_cluster
## cluster femmes pop_hommes_pct chom agric indep cadres
## 1 1 50.32449 49.67551 14.991999 0.5656880 4.276999 7.331141
## 2 2 51.50741 48.49259 9.756432 0.4434357 5.707986 12.902751
## 3 3 53.24588 46.75412 17.526271 0.3746760 5.037789 11.502063
# Calculate medians of variables for each cluster
medians_by_cluster <- aggregate(FC_dat_fc[, c("femmes", "pop_hommes_pct", "chom", "agric", "indep", "cadres")],
                                by = list(cluster = cut_ward_fc),
                                FUN = median)
medians_by_cluster
## cluster femmes pop_hommes_pct chom agric indep cadres
## 1 1 50.80316 49.19684 15.497079 0.06922811 4.288935 6.961646
## 2 2 51.36276 48.63724 9.272965 0.36697248 5.746626 11.799410
## 3 3 53.26149 46.73851 17.072148 0.16778523 4.447614 10.918801
age_15_29, age_30_74_pct, age_75p, femmes, pop_hommes_pct, chom, agric, indep, cadres, interm, ouvr, dipl_aucun ,has_dip, resid_sec, proprio, locataire, hlm , maison, appart
# Create box plots of variables by cluster
# (means_by_cluster has one row per cluster, so each box is a single value)
par(mfrow = c(1, 3))
boxplot(femmes ~ cluster, data = means_by_cluster,
        main = "Share of Women by Cluster (Mean)")
boxplot(pop_hommes_pct ~ cluster, data = means_by_cluster,
        main = "Share of Men by Cluster (Mean)")
boxplot(agric ~ cluster, data = means_by_cluster,
        main = "Share of Agricultural Workers by Cluster (Mean)")
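The box plots above are drawn from means_by_cluster, which holds a single value per cluster, so they cannot show any spread. A sketch of the alternative, plotting the per-observation values grouped by cluster, is given below on the built-in `USArrests` data; the dataset and variable names are stand-ins chosen for illustration.

```r
# Illustration on a built-in dataset (USArrests), not the census data.
x <- scale(USArrests)
cl <- cutree(hclust(dist(x), method = "ward.D2"), k = 3)

# One row per observation, with its cluster label
dat <- data.frame(USArrests, cluster = factor(cl))

par(mfrow = c(1, 3))
for (v in c("Murder", "Assault", "UrbanPop")) {
  boxplot(dat[[v]] ~ dat$cluster, main = paste(v, "by cluster"),
          xlab = "Cluster", ylab = v)
}
```

With the census data, the same pattern would use the commune-level table and the cluster vector from `cutree()` in place of the stand-ins.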
By comparing age, gender, occupation, education and type of housing, we can analyze the cluster means and medians to obtain a more comprehensive understanding of the three clusters obtained for the Franche-Comté region.
| cluster | femmes | pop_hommes_pct | chom | agric | indep | cadres | age_15_29 | age_30_74_pct | age_75p | interm | ouvr | dipl_aucun | has_dip | resid_sec | proprio | locataire | hlm | maison | appart |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 50.32449 | 49.67551 | 14.991999 | 0.5656880 | 4.276999 | 7.331141 | 18.23375 | 54.03734 | 9.438438 | 19.78441 | 40.78336 | 22.94525 | 2366.778 | 3.277629 | 56.29185 | 41.75355 | 18.443625 | 53.48484 | 45.87418 |
| 2 | 51.50741 | 48.49259 | 9.756432 | 0.4434357 | 5.707986 | 12.902752 | 15.52355 | 56.25988 | 10.382173 | 26.36782 | 26.07571 | 15.29675 | 1627.714 | 1.985429 | 69.21894 | 29.16096 | 9.151561 | 73.38042 | 26.16768 |
| 3 | 53.24588 | 46.75412 | 17.526271 | 0.3746760 | 5.037789 | 11.502064 | 19.84538 | 51.47729 | 12.329923 | 22.47357 | 28.59805 | 21.96802 | 9875.467 | 2.332520 | 42.94616 | 54.70610 | 24.068914 | 34.79207 | 64.03049 |
| cluster | femmes | pop_hommes_pct | chom | agric | indep | cadres | age_15_29 | age_30_74_pct | age_75p | interm | ouvr | dipl_aucun | has_dip | resid_sec | proprio | locataire | hlm | maison | appart |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 50.80316 | 49.19684 | 15.497079 | 0.0692281 | 4.288935 | 6.961646 | 17.34233 | 54.67033 | 9.148917 | 19.71087 | 41.56923 | 21.92019 | 1820.5 | 1.6352560 | 56.37730 | 42.20348 | 16.752136 | 56.13110 | 43.51384 |
| 2 | 51.36276 | 48.63724 | 9.272965 | 0.3669725 | 5.746626 | 11.799410 | 15.33164 | 55.64313 | 9.642744 | 26.18619 | 26.66667 | 14.77573 | 1463.0 | 0.8849558 | 68.75000 | 30.14706 | 8.998647 | 72.65569 | 26.73943 |
| 3 | 53.26149 | 46.73851 | 17.072148 | 0.1677852 | 4.447614 | 10.918801 | 19.14943 | 51.57610 | 11.644501 | 22.10014 | 27.97943 | 21.27419 | 3706.0 | 1.5268079 | 44.75975 | 52.36295 | 22.730414 | 38.69637 | 60.55597 |
The share of women varies across the clusters: it is lowest in the first cluster (50.3%) and highest in the third (53.2%). The share of men shows the opposite pattern, since the two percentages are complementary.
The first cluster has an intermediate age profile: its share of people aged 15-29 (18.2%) and its share of people aged 30-74 (54.0%) both lie between those of the other two clusters, and its share of people aged 75+ (9.4%) is the lowest.
In contrast, the second cluster has the lowest share of young people (15.5%) and the highest share of people aged 30-74 (56.3%).
The third cluster, on the other hand, has the highest share of both young people aged 15-29 (19.8%) and people aged 75+ (12.3%).
The first cluster has the highest shares of agricultural workers (0.57%) and manual workers (40.8%), and the lowest shares of executives (7.3%) and intermediate professions (19.8%). This observation is consistent with the results of the PCA, which showed an inverse correlation between manual workers and executives.
The second cluster stands out with the highest shares of executives (12.9%) and intermediate professions (26.4%), and the lowest share of manual workers (26.1%).
The third cluster has a notable share of executives (11.5%), just behind the second cluster, a share of manual workers (28.6%) close to that of the second cluster, and the lowest share of agricultural workers (0.37%).
In cluster 1, housing is fairly balanced: 56.3% of owners against 41.8% of tenants, and 53.5% of houses against 45.9% of apartments.
The PCA supports the finding that cluster 2 has a greater proportion of homeowners (69.2%) and a smaller proportion of renters (29.2%) than the other clusters.
The majority of housing in cluster 3 is apartments (64.0%). This cluster has the lowest share of homeowners (42.9%) and the highest shares of tenants (54.7%) and HLM housing (24.1%).
The first cluster has an intermediate age profile, with the lowest share of people aged 75+. It has the highest share of people without a diploma (22.9%), and in terms of occupation it has the highest shares of manual and agricultural workers and the lowest share of executives. Education and occupation are therefore consistent in this group: low qualification levels go together with manual occupations.
The second cluster is distinct from the others in several ways. Firstly, it has the lowest share of young people and the highest share of people aged 30-74. Additionally, it has the highest shares of executives and intermediate professions and the lowest share of manual workers, so its occupational composition is markedly different from that of the other clusters. It also has the lowest unemployment rate (9.8%) and the lowest share of people without a diploma (15.3%). Finally, it is characterized by a higher proportion of homeowners and a lower proportion of renters than the other clusters.
The third cluster also displays distinct demographic and occupational characteristics. It has the highest shares of both young and old people and a notable share of executives, just behind the second cluster. Its share of people without a diploma (22.0%) is close to that of the first cluster, and its unemployment rate is the highest (17.5%). Housing-wise, the majority of dwellings in this cluster are apartments, with the lowest share of homeowners and the highest shares of tenants and HLM housing.
From our analysis, we deduce that being a manual worker is closely correlated with having no diploma.
Unemployment is correlated with living in HLM housing and inversely correlated with home ownership.
The 30-74 age group is correlated with living in a house and owning it, and inversely correlated with renting an apartment.
The communes of Branges and Gergy in Bourgogne stand out for home ownership among people aged 30 to 74.
The communes of Bourgogne have more homeowners aged 30-74. However, more communes of Franche-Comté are associated with renting and unemployment.
More people are employed as executives in the Bourgogne region.
Renting, living in an apartment and living in HLM housing are more strongly correlated in Franche-Comté.
Women aged 75+ are distributed differently across communes in Franche-Comté than in Bourgogne.
Occupation is correlated with level of education. Having no diploma is most closely associated with occupations requiring manual work. Similarly, having at least one diploma is more closely associated with executive positions.
Overall, employment status is correlated with housing tenure (renting or owning) in both regions, and this relationship varies from commune to commune. In Franche-Comté, executive jobs are more associated with women than with men. Additionally, executives are correlated with intermediate professions in both regions.
A more in-depth analysis of the communes of both regions could investigate which types of communes exhibit similar economic conditions. More data would be needed for this: salary, tax payments, financial background and field of work, for example, could all be influencing factors.